Graph Grammar Based Analysis System of Complex Table Form Document
Abstract
Structure analysis of table form documents is important because printed documents, and electronic documents as well, explicitly provide only geometrical layout and lexical information. To handle these documents automatically, logical structure information is necessary. In this paper, we first propose a general XML-based representation of table form documents that contains both structure and layout information. Next, we present a structure analysis system based on a graph grammar that represents knowledge of document structure. Because the relations between adjacent fields in table form documents are two-dimensional, a two-dimensional notation is needed to denote structural knowledge; we therefore adopt a two-dimensional graph grammar. Because the rules are relatively simple, the grammar notation makes this knowledge easy to modify and to keep consistent. Another advantage of the grammar notation is that it can be used to generate documents from the logical structure alone. Experimental results show that the system successfully analyzed several kinds of table forms.
Similar resources
Mapping of McGraw Cycle to RUP Methodology for Secure Software Developing
Designing secure software is one of the major phases in developing robust software. The McGraw life cycle, one of the well-known software security development approaches, implements different touch points as a collection of software security practices. Each touch point includes explicit instructions for applying security in terms of design, coding, measurement, and maintenance of softwar...
Reliability estimation of Iran's power network
Today, the electric power system is the most complicated engineering system that has ever been made. The integration of power generating stations with power transmission lines has created a network, called a complex power network. The reliability estimation of such complex power networks is a very challenging problem, as one cannot find any immediate solution methods in the current literature. In this pape...
The Effect of Written Corrective Feedback on the Accuracy of Output Task and Learning of Target Form
The effect of error feedback on the accuracy of output task types such as the editing task, text reconstruction task, picture-cued writing task, and dictogloss task has not been clearly explored. Following arguments that combining corrective feedback with output makes it difficult to determine whether their effects arise in combination or alone, the purpose of the present st...
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, one of the important techniques of text mining, is the unsupervised classification of similar documents into different groups. The most important step...
Mathematical formula recognition using graph grammar
This paper describes current results of Ofr (Optical Formula Recognition), a system for extracting and understanding mathematical expressions in documents. Such a tool could be really useful to be able to re-use knowledge in scientific books which are not available in electronic form. We currently also study use of this system for direct input of formulas with a graphical tablet for computer al...
Publication date: 2003